Species tree estimation using ASTRAL: how many genes are enough?
Abstract
Species tree reconstruction from genomic data is increasingly performed using methods that account for sources of gene tree discordance such as incomplete lineage sorting. One popular method for reconstructing species trees from unrooted gene tree topologies is ASTRAL. In this paper, we derive theoretical sample complexity results for the number of genes required by ASTRAL to guarantee reconstruction of the correct species tree with high probability. We also validate those theoretical bounds in a simulation study. Our results indicate that ASTRAL requires $O(f^{-2} \log n)$ gene trees to reconstruct the species tree correctly with high probability, where $n$ is the number of species and $f$ is the length of the shortest branch in the species tree. Our simulations, which are the first to test ASTRAL explicitly under the anomaly zone, show trends consistent with the theoretical bounds and also provide some practical insights into the conditions under which ASTRAL works well.
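To make the stated growth rate concrete, the following minimal sketch evaluates the expression inside the big-O bound, log(n) / f^2, for a given shortest branch length f and number of species n. The function name `genes_needed_scale` and the constant `c` are illustrative assumptions; the abstract gives no constant, so the output is meaningful only as a ratio between two settings, not as an absolute number of genes.

```python
import math


def genes_needed_scale(f, n, c=1.0):
    """Evaluate c * log(n) / f**2, the growth rate quoted in the abstract.

    f: length of the shortest species tree branch (must be positive).
    n: number of species (must be at least 2).
    c: hypothetical constant factor; the abstract does not specify one,
       so only ratios of this function's outputs are informative.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    if n < 2:
        raise ValueError("n must be at least 2")
    return c * math.log(n) / f ** 2


if __name__ == "__main__":
    # Halving the shortest branch length, with n fixed.
    ratio_f = genes_needed_scale(0.1, 100) / genes_needed_scale(0.2, 100)
    # Squaring the number of species, with f fixed.
    ratio_n = genes_needed_scale(0.1, 100 ** 2) / genes_needed_scale(0.1, 100)
    print(f"halving f multiplies the bound by about {ratio_f:.2f}")
    print(f"squaring n multiplies the bound by about {ratio_n:.2f}")
```

Under this reading of the bound, the shortest branch length dominates: halving f quadruples the expression, while squaring the number of species only doubles it.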
Similar resources
ASTRAL: genome-scale coalescent-based species tree estimation
MOTIVATION Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and speci...
ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes
MOTIVATION The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods o...
The Impact of Missing Data on Species Tree Estimation.
Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree es...
Supplementary Material to ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation
3 Experimental Details; 3.1 Extra trees for Zhong et al. biological dataset; 3.2 Methods and Commands; 3.2.1 Gene tree estimation; 3.2.2 ASTRAL; 3.2.3 BUCKy-pop; 3.2.4 MRP and MRL ...
Determining Difference in Evolutionary Variation of Bacterial RecA proteins vs 16SrRNA Genes by using 16s_Toxonomy Tree
Background and Aims: The rate of variation in various genes of a bacterial species is different during evolution. Therefore, in systematic bacterial studies many researchers compare the phylogenetic tree of a particular gene to the standard tree of an rRNA gene. Regarding the importance of 16SrRNA gene and the evolutional process of RecA protein family, we investigated the changes in the select...
Journal:
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication date: 2017